New AGI Test Stumps AI Models, Exposes Limits in General Intelligence

March 25, 2025 12:50

The ARC-AGI-2 benchmark, created by the Arc Prize Foundation, tests AI's true general intelligence: top models scored just 1%, while humans averaged 60%. Unlike past tests, it blocks brute-force tactics and demands real reasoning. Even OpenAI's o3 (low), which excelled on the previous version, hit only 4%, at high compute cost. To drive progress, the Arc Prize 2025 offers a reward for any model reaching 85% accuracy efficiently, raising the question: how close are we to AGI?

The Arc Prize Foundation, co-founded by AI researcher François Chollet, has unveiled ARC-AGI-2, a new, ultra-challenging benchmark designed to test AI models’ true general intelligence. The results? Most leading models have failed spectacularly.

How Did AI Models Perform?

According to the Arc Prize leaderboard, top reasoning models like OpenAI’s o1-pro and DeepSeek’s R1 barely managed 1%-1.3% accuracy. Other powerful models, including GPT-4.5, Claude 3.7 Sonnet, and Gemini 2.0 Flash, scored around 1%.

In contrast, human participants averaged 60% accuracy, outperforming the AI models by a wide margin.

What Makes ARC-AGI-2 So Hard?

Unlike traditional AI benchmarks, ARC-AGI-2 presents adaptive pattern-recognition challenges that require AI to produce correct answers to tasks it has never seen before. Compared with its predecessor, ARC-AGI-1, the new test:
🔹 Prevents brute-force computing from inflating scores
🔹 Emphasizes efficiency, measuring how AI acquires and applies new skills
🔹 Eliminates reliance on memorization, forcing models to think on the fly
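ARC-style tasks present a handful of input-to-output grid pairs demonstrating an unknown transformation rule, and the solver must infer the rule and apply it to a fresh input. The toy sketch below illustrates that few-shot, memorization-free format; the grids and the tiny hand-written hypothesis space are illustrative assumptions, not actual ARC-AGI-2 tasks (real tasks are far too open-ended to solve by enumerating fixed rules).

```python
# Toy ARC-style puzzle: infer a grid transformation from a few
# demonstration pairs, then apply it to a new test input.
# (Illustrative sketch only -- not a real ARC-AGI-2 task or solver.)

def flip_h(g):       # mirror each row left-to-right
    return [row[::-1] for row in g]

def flip_v(g):       # mirror the rows top-to-bottom
    return g[::-1]

def transpose(g):    # swap rows and columns
    return [list(col) for col in zip(*g)]

# A hypothetical, hand-picked hypothesis space of candidate rules.
CANDIDATES = [flip_h, flip_v, transpose]

def solve(demos, test_input):
    """Return the first candidate rule consistent with every demo pair,
    applied to the test input; None if no candidate fits."""
    for rule in CANDIDATES:
        if all(rule(x) == y for x, y in demos):
            return rule(test_input)
    return None

demos = [
    ([[1, 0], [0, 2]], [[0, 1], [2, 0]]),  # only flip_h fits both pairs
    ([[3, 4], [5, 6]], [[4, 3], [6, 5]]),
]
print(solve(demos, [[7, 8], [9, 0]]))      # [[8, 7], [0, 9]]
```

The point of the benchmark's design is that a model cannot pre-store the answer: the rule must be induced from the demonstrations at test time, which is exactly what brute-force or memorization-heavy approaches fail to do.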

Even OpenAI’s o3 (low) model, which dominated ARC-AGI-1 with 75.7% accuracy, scored just 4% on ARC-AGI-2—and only after burning $200 in compute per task.

The Need for Better AI Benchmarks

Tech leaders, including Hugging Face’s Thomas Wolf, argue that AI development lacks rigorous benchmarks for creativity and general intelligence. ARC-AGI-2 may be the first real step in addressing that gap.

To push AI research forward, the Arc Prize Foundation has launched the Arc Prize 2025 challenge, offering a prize for any model that achieves 85% accuracy on ARC-AGI-2 while spending just $0.42 per task.

With AI models struggling to pass this test, the question looms: How far are we from real artificial general intelligence?
